Efficient Algorithms for Mining of High Utility Itemsets
Abstract
The utility of an itemset represents its importance, which can be measured in terms of weight, value, quantity or other information, depending on the user's specification. High utility itemset mining identifies itemsets whose utility meets a given minimum utility threshold. It allows users to quantify the usefulness of, or their preference for, items using different values, and thus reflects the differing impact of individual items. High utility itemset mining is useful in the decision-making processes of many applications, such as retail marketing and Web services, since items in real applications differ in many respects. One popular application is market basket analysis, which refers to the discovery of sets of items (itemsets) that are frequently purchased together by customers. In this application, however, the traditional model of frequent itemset mining (FIM) may discover a large number of frequent but low-revenue itemsets and lose the information on valuable itemsets that sell infrequently. We propose a novel framework for mining closed+ high utility itemsets (CHUIs), which serve as a compact and lossless representation of high utility itemsets (HUIs). We propose three efficient algorithms named AprioriHC (Apriori-based algorithm for mining High utility Closed+ itemsets), AprioriHC-FD (AprioriHC algorithm with Fast Discarding of unpromising and isolated items) and FCHUD (Fast Closed+ High Utility itemset Discovery). The framework integrates closed itemset mining with high utility itemset mining, and we find it possible to develop other compact representations of high utility itemsets, inspired by this work, to reduce the number of redundant high utility patterns.
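The notions of utility, minimum utility threshold and closedness used in the abstract can be illustrated on a toy example. The following is a minimal brute-force sketch, not any of the three proposed algorithms: the transactions, the unit profits and all function names are hypothetical, and "closed" is taken in the usual sense (no proper superset has the same support), without the additional annotation implied by "closed+".

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
TRANSACTIONS = [
    {"a": 1, "c": 1},
    {"a": 2, "b": 1, "c": 1},
    {"b": 3, "c": 2},
    {"a": 1, "c": 2},
]
# Hypothetical unit profit of each item.
PROFIT = {"a": 5, "b": 1, "c": 3}


def support(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in TRANSACTIONS if all(i in t for i in itemset))


def utility(itemset):
    """Sum of quantity * unit profit over all transactions containing the itemset."""
    total = 0
    for t in TRANSACTIONS:
        if all(i in t for i in itemset):
            total += sum(t[i] * PROFIT[i] for i in itemset)
    return total


def all_itemsets():
    """Enumerate every non-empty itemset (exponential; fine only for toy data)."""
    items = sorted({i for t in TRANSACTIONS for i in t})
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def high_utility_itemsets(min_util):
    """Itemsets that occur at least once and whose utility is >= min_util."""
    return {
        x: utility(x)
        for x in all_itemsets()
        if support(x) > 0 and utility(x) >= min_util
    }


def closed_high_utility_itemsets(min_util):
    """High utility itemsets with no proper superset of equal support."""
    candidates = list(all_itemsets())
    closed = {}
    for x, u in high_utility_itemsets(min_util).items():
        s = support(x)
        if not any(x < y and support(y) == s for y in candidates):
            closed[x] = u
    return closed


if __name__ == "__main__":
    for name, result in (
        ("HUIs", high_utility_itemsets(15)),
        ("closed HUIs", closed_high_utility_itemsets(15)),
    ):
        print(name, {"".join(sorted(x)): u for x, u in result.items()})
```

With a threshold of 15, the itemsets {a}, {c} and {a, c} are high utility, but {a} is dropped from the closed set because it always occurs together with c. Enumerating every itemset is exponential in the number of items, which is the cost that dedicated mining algorithms aim to avoid.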
Similar resources
A New Algorithm for High Average-utility Itemset Mining
High utility itemset mining (HUIM) is an emerging field in data mining that has attracted growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds a minimum threshold. The basic HUIM problem does not consider the length of itemsets in its utility measurement, and utility values tend to become higher for itemsets containing more items...
An Efficient Data Structure for Fast Mining High Utility Itemsets
High utility itemset mining has emerged as an important research issue in data mining since it has a wide range of real-life applications. Although a number of algorithms have been proposed in recent years, efficient algorithms still seem to be lacking, since these algorithms suffer from either the problem of low efficiency of calculating candidates' utilities or the proble...
Mining High Utility Pattern in One Phase without Candidate Generation using up Growth+ Algorithm
Utility mining was developed to address the limitations of frequent itemset mining by introducing interestingness measures that satisfy both statistical significance and the user's expectations. Existing high utility itemset mining algorithms work in two steps: first, generate a large number of candidate itemsets and second, identify high utility itemsets from the candidates by an additional scan of t...
Data sanitization in association rule mining based on impact factor
Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates the concerns of individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...
Mining High Utility Itemsets from Large Transactions using Efficient Tree Structure
Mining high utility itemsets from a transactional database refers to the discovery of itemsets with high utility, such as profit. It is an extension of frequent pattern mining. Although a number of relevant algorithms have been proposed in recent years, they suffer from the problem of producing a large number of candidate itemsets for high utility itemsets. Such a large number of candidate itemsets ...
Publication date: 2016